A Proofs
Section A.1 presents the lemmas used to prove the main results, and Section A.2 presents the main results. The first two inequalities follow from the triangle inequality, and the third follows from the definition of the L-divergence in Eq. (5); we complete the proof by applying Lemma A.1 to bound the remaining term. Under the conditions of Theorem 4.1, the variance term admits an upper bound, and applying Lemma A.3 and Lemma A.4 bounds the Rademacher complexity. Following the proof of Theorem 4.1, and under the conditions of Proposition 4.3, the estimator converges as N grows. Based on the result of Proposition 4.3, for any δ in (0, 1), the deviation is bounded by a term of order 4LB(sqrt(2 D ln 2) + 1). We complete the proof by applying the triangle inequality.
Table III: Samples from p and q are labeled with 0 and 1, respectively. All values are averaged over five trials.
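As a hedged reconstruction of the triangle-inequality chain referenced above (the symbols below, including the intermediate quantity, are assumed notation rather than text recovered from the paper), two triangle-inequality steps followed by the L-divergence definition of Eq. (5) would take the form:

```latex
% Assumed notation: \widehat{D}_N is the empirical estimate, D the population
% quantity, and \widetilde{D}_N an intermediate (e.g. smoothed) version.
\begin{align*}
\bigl|\widehat{D}_N - D\bigr|
  &\le \bigl|\widehat{D}_N - \widetilde{D}_N\bigr|
     + \bigl|\widetilde{D}_N - D\bigr|            && \text{(triangle inequality)} \\
  &\le \bigl|\widehat{D}_N - \widetilde{D}_N\bigr|
     + \bigl|\widetilde{D}_N - \mathbb{E}\,\widetilde{D}_N\bigr|
     + \bigl|\mathbb{E}\,\widetilde{D}_N - D\bigr| && \text{(triangle inequality)} \\
  &\le \bigl|\widehat{D}_N - \widetilde{D}_N\bigr|
     + \bigl|\widetilde{D}_N - \mathbb{E}\,\widetilde{D}_N\bigr|
     + \varepsilon_L                               && \text{(L-divergence, Eq.\ (5))}
\end{align*}
```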
Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs
Zihan Dong, Ruijia Wu, Linjun Zhang
The increasing reliance on human preference feedback to judge AI-generated pseudo-labels has created a pressing need for principled, budget-conscious data acquisition strategies. We address the question of how to optimally allocate a fixed annotation budget between ground-truth labels and pairwise preferences over AI-generated outputs. Our solution, grounded in semi-parametric inference, casts the budget allocation problem in a monotone missing-data framework. Building on this formulation, we introduce Preference-Calibrated Active Learning (PCAL), a novel method that learns the optimal data acquisition strategy and yields a statistically efficient estimator for functionals of the data distribution. Theoretically, we prove the asymptotic optimality of the PCAL estimator and establish a robustness guarantee that ensures reliable performance even when nuisance models are poorly estimated. The framework applies to a general class of problems because it directly optimizes the estimator's variance rather than requiring a closed-form solution. This work provides a principled and statistically efficient approach to budget-constrained learning in modern AI. Simulations and real-data analysis demonstrate the practical benefits and superior performance of the proposed method.
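Since the abstract's central idea is choosing the acquisition strategy by directly minimizing the estimator's variance, a toy sketch may help fix ideas. Everything below (the costs, the plug-in variance model, and the function names) is a hypothetical stand-in for illustration, not the paper's estimator:

```python
import numpy as np

# Toy illustration (all quantities hypothetical, not the paper's model) of the
# PCAL principle: pick the budget split by directly minimizing a plug-in
# estimate of the final estimator's variance, instead of a closed-form rule.

B, c_lab, c_pref = 1000.0, 5.0, 1.0   # total budget and per-annotation costs
sigma2_lab = 1.0                      # assumed label-noise level
rho, kappa = 0.6, 100.0               # assumed calibration gain and saturation

def plugin_variance(pi):
    """Assumed variance when a fraction pi of the budget buys ground-truth
    labels and the remainder buys pairwise preferences."""
    n_lab = pi * B / c_lab
    n_pref = (1 - pi) * B / c_pref
    if n_lab < 1:
        return np.inf                 # labels are needed for identification here
    # Preferences shrink the variance through a saturating calibration term,
    # a crude stand-in for the semiparametric efficiency gain.
    return sigma2_lab / n_lab * (1 - rho * n_pref / (n_pref + kappa))

grid = np.linspace(0.01, 1.0, 200)
pi_star = grid[np.argmin([plugin_variance(p) for p in grid])]
print(f"estimated optimal fraction of budget spent on labels: {pi_star:.2f}")
```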
Error analysis for the deep Kolmogorov method
Iulian Cîmpean, Thang Do, Lukas Gonon, Arnulf Jentzen, Ionel Popescu
The deep Kolmogorov method is a simple and popular deep-learning-based method for approximating solutions of partial differential equations (PDEs) of Kolmogorov type. In this work we provide an error analysis of the deep Kolmogorov method for heat PDEs. Specifically, we establish convergence, with explicit rates, for the overall mean square distance between the exact solution of the heat PDE and the realization function of the approximating deep neural network (DNN) produced by a stochastic optimization algorithm. The rates are given in terms of the size of the architecture of the approximating DNN (the depth/number of hidden layers and the width of the hidden layers), the number of random sample points used in the loss function (the number of input-output data pairs), and the size of the optimization error made by the employed stochastic optimization method.
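To make the learning problem concrete, below is a minimal sketch of the deep Kolmogorov method for the heat PDE u_t = Δu with initial condition g. By the Feynman-Kac formula, u(T, x) = E[g(x + sqrt(2T) W)] with W ~ N(0, I_d), so regressing a DNN onto the stochastic target g(x + sqrt(2T) W) recovers u(T, ·) as the L2-optimal predictor. The architecture, domain, and hyperparameters are illustrative assumptions, not the paper's choices:

```python
import torch

# Learn x -> u(T, x) for u_t = Laplacian(u), u(0, .) = g, by regressing onto
# the Feynman-Kac target g(x + sqrt(2T) W); the L2 minimizer over all
# functions is exactly the conditional expectation u(T, x).

d, T, batch = 5, 1.0, 256
g = lambda x: (x ** 2).sum(dim=1, keepdim=True)   # example initial condition

net = torch.nn.Sequential(                         # assumed small architecture
    torch.nn.Linear(d, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x = 2 * torch.rand(batch, d) - 1               # spatial points in [-1, 1]^d
    w = torch.randn(batch, d)
    target = g(x + (2 * T) ** 0.5 * w)             # one-sample Monte Carlo label
    loss = ((net(x) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# For this g, the exact solution is u(T, x) = |x|^2 + 2*d*T, so the learned
# value at the origin should be close to 2*d*T after training.
x0 = torch.zeros(1, d)
print(net(x0).item(), 2 * d * T)
```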
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Generalized linear models (GLMs)--such as logistic regression, Poisson regression, and robust regression--provide interpretable models for diverse data types. Probabilistic approaches, particularly Bayesian ones, allow coherent estimates of uncertainty, incorporation of prior information, and sharing of power across experiments via hierarchical models. In practice, however, the approximate Bayesian methods necessary for inference have either failed to scale to large data sets or failed to provide theoretical guarantees on the quality of inference. We propose a new approach based on constructing polynomial approximate sufficient statistics for GLMs (PASS-GLM). We demonstrate that our method admits a simple algorithm as well as trivial streaming and distributed extensions that do not compound error across computations. We provide theoretical guarantees on the quality of point (MAP) estimates, the approximate posterior, and posterior mean and uncertainty estimates. We validate our approach empirically in the case of logistic regression using a quadratic approximation and show competitive performance with stochastic gradient descent, MCMC, and the Laplace approximation in terms of speed and multiple measures of accuracy--including on an advertising data set with 40 million data points and 20,000 covariates.
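The quadratic (degree-2) case admits a compact illustration: once log sigma(s) is approximated by a quadratic on a trust interval, the entire logistic log-likelihood collapses to low-order moments of the data, computable in a single streaming pass. The sketch below is a rough stand-in for the method (the paper constructs the polynomial approximation more carefully, e.g. via Chebyshev interpolation, and handles priors; the least-squares fit, scaling, and constants here are assumptions):

```python
import numpy as np

# Degree-2 PASS-GLM-style sketch for logistic regression with labels in
# {-1, +1}: fit log sigma(s) ~ b0 + b1*s + b2*s^2 on s in [-R, R], so
# sum_n log sigma(y_n x_n . theta) depends on the data only through the
# moments m1 and M2 below (one streaming pass, trivially distributable).

rng = np.random.default_rng(0)
N, d, R = 10_000, 3, 4.0
theta_true = rng.normal(size=d)
X = rng.normal(size=(N, d)) / np.sqrt(d)   # scaled so x.theta stays in [-R, R]
y = np.where(rng.random(N) < 1 / (1 + np.exp(-X @ theta_true)), 1.0, -1.0)

s = np.linspace(-R, R, 200)
b2, b1, b0 = np.polyfit(s, -np.log1p(np.exp(-s)), deg=2)   # fit log sigma(s)

m1 = (y[:, None] * X).sum(axis=0)   # sum_n y_n x_n
M2 = X.T @ X                        # sum_n x_n x_n^T   (y_n^2 = 1)

# Surrogate log-likelihood: N*b0 + b1 * theta.m1 + b2 * theta^T M2 theta,
# concave since b2 < 0; its maximizer solves 2*b2*M2 theta = -b1*m1.
theta_hat = np.linalg.solve(2 * b2 * M2, -b1 * m1)
print(np.round(theta_hat, 2), np.round(theta_true, 2))   # rough agreement
```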
SEGA: Variance Reduction via Gradient Sketching
Filip Hanzely, Konstantin Mishchenko, Peter Richtarik
We propose a randomized first-order optimization method--SEGA (SkEtched GrAdient)--which progressively throughout its iterations builds a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient. In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch, and this is subsequently used to compute an unbiased estimate of the true gradient through a random relaxation procedure. This unbiased estimate is then used to perform a gradient step. Unlike standard subspace descent methods, such as coordinate descent, SEGA can be used for optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is up to a small constant factor identical to the best-known rate of coordinate descent.
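As a concrete illustration of the coordinate-sketch special case, here is a short, self-contained sketch (the quadratic test problem and the step size are assumptions for the demo, not the paper's tuned parameters). Each iteration observes a single partial derivative, refreshes the running gradient estimate h by sketch-and-project, and forms the unbiased estimate g by random relaxation:

```python
import numpy as np

# SEGA with coordinate sketches on a strongly convex quadratic
# f(x) = 0.5 x^T A x - b^T x; only one partial derivative is seen per step.

rng = np.random.default_rng(0)
d = 20
A = np.diag(np.linspace(1.0, 5.0, d))   # SPD Hessian: L = 5, mu = 1
b = rng.normal(size=d)

x = np.zeros(d)
h = np.zeros(d)                         # variance-reduced gradient estimate
alpha = 1.0 / (2 * d * 5.0)             # conservative step size (assumption)

for k in range(50_000):
    i = rng.integers(d)                 # coordinate sketch S_k = e_i
    gi = A[i] @ x - b[i]                # the only gradient info this step
    g = h.copy()
    g[i] += d * (gi - h[i])             # random relaxation: E[g] = full gradient
    h[i] = gi                           # sketch-and-project update of h
    x -= alpha * g

print(np.linalg.norm(A @ x - b))        # gradient norm, should be near zero
```

Unbiasedness is easy to verify: averaging g over the uniform choice of i gives h + (grad f(x) - h) = grad f(x), which is what licenses the plain gradient step.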